Abstract

MYB proteins constitute one of the largest transcription factor families in plants. Recent evidence revealed
that MYB-related genes play crucial roles in plants. However, compared with the R2R3-MYB type, little is
known about the complex evolutionary history of MYB-related proteins in plants. Here, we present a
genome-wide analysis of MYB-related proteins from 16 species of flowering plants, moss, Selaginella, and
algae. We identified many MYB-related proteins in angiosperms, but few in algae. Phylogenetic analysis
classified MYB-related proteins into five distinct subgroups, a result supported by highly conserved intron
patterns, consensus motifs, and protein domain architecture. Phylogenetic and functional analyses revealed
that the Circadian Clock Associated 1-like/R-R and Telomeric DNA-binding protein-like subgroups are >1
billion yrs old, whereas the I-box-binding factor-like and CAPRICE-like subgroups appear to be newly
derived in angiosperms. We further demonstrated that the MYB-like domain has evolved under strong
purifying selection, indicating the conservation of MYB-related proteins. Expression analysis revealed that the
MYB-related gene family has a wide expression profile in maize and soybean development and plays important
roles in development and stress responses. We hypothesize that MYB-related proteins initially diversified
through three major expansions and domain shuffling, but remained relatively conserved throughout
subsequent plant evolution.

Key words: MYB-related transcription factors; classification; evolution; phylogenetic analysis; expression profile 
analysis 



1. Introduction

MYB proteins are characterized by a conserved DNA-binding domain and constitute one of the largest
families of transcription factors (TFs) in plants. They are classified into four major groups according to the
number of adjacent repeats in the DNA-binding domain. 1 All four groups are found in plants. The most
common is the 2R-MYB group. The second group comprises a heterogeneous collection of R3- or 1R-MYB
type proteins, hereafter referred to as MYB-related proteins, which usually contain a single MYB
repeat. The third and fourth groups are composed of 3R-MYB and 4R-MYB type proteins, respectively. These
latter groups consist of only 1-5 members.

The first plant MYB-encoding gene, C1 (2R-MYB), was isolated from maize (Zea mays). 2 Accordingly, research
on MYB genes has mainly focused on the 2R-MYB gene family because of its large size. 1 In the last two
decades, a vast number of plant 2R-MYB genes have been shown to play important roles in many
plant-specific processes. The first plant MYB-related gene (MybSt1) was isolated from potato. 3 The numerous
MYB-related genes subsequently identified play key roles as transcriptional regulators, 3,4 circadian
clock-associated repressors, 5,6 and telomeric repeat-binding proteins 7,8 in diverse biological processes. To date,
genome-wide analyses of 2R-MYB proteins have been conducted in numerous plant species based on
sequenced genomes. 1,9-11 However, comprehensive analyses of MYB-related proteins in major land plants
are still lacking. Accordingly, the evolutionary relationships between plant MYB-related proteins remain
unknown, necessitating a detailed survey and classification of disparate evolutionary groups.

To understand the evolutionary history of plant MYB-related genes, we identified MYB-related proteins at the
genome-wide level and performed structural and evolutionary analyses across distantly related plant
evolutionary lineages, including eudicots, monocots, a lycophyte, a bryophyte, five chlorophyte species,
and a rhodophyte. Subsequently, we assessed the origins, patterns of differentiation, and expansion of
different phylogenetic subgroups of this gene family. In addition, we analysed the expression of MYB-related
genes in different tissues and developmental stages and under stress treatments.

2. Materials and methods 

2.1. Sequence retrieval 

We performed a BLASTP search among sequenced genomes of land plants in Phytozome
(http://www.Phytozome.net) using well-known plant MYB-related proteins as queries. The species represented
a broad range of the plant lineages from unicellular green algae to multicellular plants
(http://www.jgi.doe.gov/). To verify the reliability of our results, all putative non-redundant sequences
were assessed with PROSITE profiling 12 and SMART analysis, 13 respectively.

2.2. Multiple sequence alignments 

Multiple alignments of MYB domains in candidate genes were performed using the MAFFT version 7
software under default parameters. 14 Nucleotide substitution levels were calculated using HyPhy version
2.0. 15 The HyPhy batch file QuickSelectionDetection.bf was used to estimate site-by-site variation in rates.

2.3. Phylogenetic analysis 

A neighbour-joining (NJ) tree was constructed using the MEGA version 5 software, 16 based on the alignment
of MYB domains. The tree was built with the following parameters: p-distance and pairwise deletion.
To determine the statistical reliability, bootstrap analysis was performed with 1000 replicates.
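
The two distance settings named above can be illustrated with a minimal sketch. This is not MEGA's implementation; it only assumes the usual definitions, in which the p-distance is the proportion of compared sites that differ and pairwise deletion skips a site only for the pair in which one of the two sequences has a gap. The function names and the choice of gap symbols are illustrative assumptions.

```python
from itertools import combinations

# Symbols treated as gaps or missing data (an assumption for this sketch).
GAP_CHARS = {"-", "?"}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped for this pair only,
    which is the idea behind pairwise deletion.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: ignore this site for this pair
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_matrix(alignment):
    """Build a symmetric distance lookup from a {name: aligned_sequence} dict."""
    names = sorted(alignment)
    dist = {(name, name): 0.0 for name in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


if __name__ == "__main__":
    # Toy alignment, not real MYB domain sequences.
    toy = {"seq1": "WKAE-", "seq2": "WRAEL", "seq3": "WRGEL"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 3))
```

A matrix of this kind is the input to neighbour joining; bootstrap replicates resample alignment columns with replacement and repeat the calculation.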

2.4. Detection of conserved motifs 

Conserved motifs of MYB-related proteins were identified statistically with the MEME program. 17 The
following parameter settings were used: maximum number of motifs, 100; minimum width of motif, 6;
and maximum width of motif, 300. All putative motifs with expected values of <1E-30 were discarded. In
addition, we used the PFAM tool to identify whether any remaining motifs matched well-known motifs. 18

2.5. Gene expression analysis 

Maize and soybean public expression datasets were obtained from the Plant Expression Database
(PLEXdb). 19 Additionally, maize and soybean microarray-based datasets, with accession numbers
GSE16567, GSE40052, GSE19501, GSE10023, GSE31188, GSE31763, GSE15100, GSE35427, and
GSE18827, were downloaded from the PLEXdb. A hierarchical cluster was created using Cluster 3.0. 20

3. Results and discussion 

3.1. Identification of MYB-related proteins in plants

To identify MYB-related proteins in land plants, we implemented BLASTP searches of the complete
genomes of the red alga (Cyanidioschyzon merolae); the chlorophytes (Volvox carteri, Chlamydomonas
reinhardtii, Ostreococcus tauri, Ostreococcus lucimarinus, and Chlorella vulgaris); the moss (Physcomitrella
patens); the lycophyte (Selaginella moellendorffii); the eudicots (Arabidopsis thaliana, Citrus sinensis, Populus
trichocarpa, Glycine max, Vitis vinifera L., and Solanum lycopersicum); and the monocots (maize and
Brachypodium distachyon). Each matching sequence was then used to search the respective genome
databases until no new sequences were found.
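
The iterative strategy described above, in which every new hit is reused as a query until no new sequences are found, can be sketched as a search to a fixed point. The sketch below is an illustration only: `search_fn` is a hypothetical stand-in for one BLASTP run against a genome database, and the toy hit table replaces real search results.

```python
from collections import deque


def iterative_search(seed_ids, search_fn):
    """Repeat searches until no new sequences are found.

    seed_ids:  initial query identifiers (e.g. well-known MYB-related proteins).
    search_fn: hypothetical callable mapping one query id to an iterable of
               hit ids; in practice this would wrap a BLASTP run.
    """
    found = set(seed_ids)
    queue = deque(seed_ids)
    while queue:
        query = queue.popleft()
        for hit in search_fn(query):
            if hit not in found:
                found.add(hit)      # a new candidate sequence
                queue.append(hit)   # reuse it as a query in a later round
    return found


if __name__ == "__main__":
    # Toy hit table standing in for real search results.
    toy_hits = {"A": ["B"], "B": ["A", "C"], "C": [], "D": ["E"]}
    result = iterative_search(["A"], lambda q: toy_hits.get(q, []))
    print(sorted(result))
```

The loop terminates because each identifier is queued at most once; candidates collected this way still need the domain checks and manual curation described in the text.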

Further analyses focused only on proteins with full-length open reading frames. We referred to the
sequences of MYB-related proteins in the Plant Transcription Factor Database (PlantTFDB) 21 and
PlnTFDB, 22 and reconfirmed the sequences by comparative analysis. Given the substantial sequence
divergence of MYB-related genes, their identification requires careful manual checking. GARP-like TFs
are often confused with MYB-related TFs, 23 because they contain a consensus sequence (SHLQKY) very
similar to that of CCA1-like proteins (SHAQK(Y/F)F). 4 However, they contain only 1 of the 3 regularly
spaced Trp (W) residues found in the MYB domain. 23,24 Thus, we excluded GARP-like TFs from this study.
After removing incomplete or redundant sequences and predicted alternative splice variants, we identified
623 MYB-related genes (Supplementary Table S1). We identified a large number of putative MYB-related
proteins in angiosperms, but only a small number in land plants that diverged earlier (Fig. 1).
This suggests that a large expansion occurred after the origin of angiosperms. In this study, we
excluded false positives among the MYB-related proteins according to the criteria of Yanhui et al. 25
Interestingly, we retrieved four genes not previously annotated as MYB-related genes in Arabidopsis. In
addition, we retrieved a small number of genes in unicellular green algae and the red alga (Fig. 1), which
suggests that MYB-related proteins arose before plants transitioned from water to land.



3.2. Phylogenetic analysis of MYB-related proteins 

To investigate the evolution of plant MYB-related proteins, we constructed an NJ tree (Fig. 2 and
Supplementary Fig. S1) based on the alignment of the MYB domains. Based on the topology and clade
support values, the 623 MYB-related proteins were classified into five major subgroups with robust
bootstrap support (generally >60%): CCA1-like/R-R, I-box-like, CPC-like, TRF-like, and TBP-like (Fig. 2).

The CCA1-like/R-R and TBP-like subgroups were the largest of the five subgroups. All subgroups were
present in monocots and eudicots (Fig. 2), indicating that the appearance of most MYB-related genes in
plants predates the divergence of monocots and eudicots. Meanwhile, in contrast to 2R-MYB genes, 10,11 no
species-specific subgroups and/or clades were observed, implying that MYB-related genes were more
conserved during evolution. In addition, MYB-related genes from the same lineage tended to cluster together
in the phylogenetic tree and were not equally represented within a given clade, suggesting that they
experienced duplications after the lineages diverged. Additional features used for validation, discussed
below, strongly supported the reliability of the clustering results.

The CCA1-like/R-R subgroup comprised four major clades (Clades I-IV) with different intron patterns
(a-d; Figs 2 and 3). The common characteristic of this subgroup is a highly conserved motif,
SHAQK(Y/F)F, 4 in the third helix of the MYB domain. Whereas most of the plant MYB-related proteins
contain a single MYB domain, R-R proteins, similar to 2R-MYB proteins, have two MYB domains. However,
unlike in 2R-MYB proteins, the two MYB repeats in R-R proteins are separated, lying in the N-terminal and
middle regions, respectively. The second repeats are more closely related to the MYB domains of CCA1-like
proteins, and they clustered as the second clade of the CCA1-like/R-R subgroup. The TBP-like subgroup
contained five distinct clades (Fig. 2, Clades I-V) with conserved characteristics. Although the bootstrap
value of the TBP-like subgroup node was low, the reliability of the clustering was supported by the presence
of the consensus motif LKDKW(R/K)(N/T), 26 the intron patterns, and the architectures of the non-MYB motifs
(Fig. 2). The TRF-like subgroup comprised a limited number of genes from all land plants investigated,
except for Selaginella. This suggests that the TRF-like subgroup was also conserved during evolution. The
CPC-like subgroup consisted of two distinct clades (Fig. 2, I and II); one contained angiosperm genes, and
the other contained algal genes. Interestingly, members of the



Species                        Number of MYB-related proteins
Cyanidioschyzon merolae        5
Ostreococcus lucimarinus       9
Ostreococcus tauri             12
Chlorella vulgaris             5
Chlamydomonas reinhardtii      20
Volvox carteri                 6
Physcomitrella patens          27
Selaginella moellendorffii     12
Zea mays                       72
Brachypodium distachyon        44
Solanum lycopersicum           60
Vitis vinifera                 47
Arabidopsis thaliana           (value not legible)
Citrus sinensis                38
Glycine max                    127
Populus trichocarpa            71

Figure 1. Phylogenetic relationships between all species investigated in this study. The total number of MYB-related proteins found in each
genome is indicated on the right.
Figure 2. NJ analysis of 623 plant MYB-related proteins. The proteins clustered into five major subgroups, CCA1-like/R-R, I-box-like, CPC-like,
TRF-like, and TBP-like. The numbers beside the branches represent bootstrap support values from 1000 replications. The coloured lines
indicate the intron pattern as shown in Fig. 3. The coloured dots symbolize the species to which the proteins in each clade belong. The
major clades of each subgroup are numbered consecutively.



plant-only clade were characterized by very short sequences without transcription-activation domains.
The lack of moss and lycophyte proteins in this clade suggests that these genes either were lost in
early-diverging land plants during expansion or evolved after gymnosperms diverged. The high bootstrap
value for the node supported the grouping of the two clades into a single subgroup.

In our phylogenetic analysis, most of the MYB-related genes fell into a subgroup, except for AtMYBR48, which
was classified as an orphan gene.

3.3. Conserved characteristics in the MYB domain 

Alignment analysis revealed that the MYB domains of the five subgroups are remarkably divergent, but are
highly similar within each subgroup or clade (Fig. 3). Similar to 2R-, 3R-, and 4R-MYB proteins, 10
MYB-related proteins also contained the three evenly distributed Trp (W) residues characteristic of MYB
repeats. However, 13% and 65% of the plant MYB-related genes had a substitution at the first or the third
Trp (W) residue in the MYB domain, respectively. Most members of the TRF-like and TBP-like subgroups
contained the three W residues. In contrast, the third W residue was often substituted by Ala (A) and Tyr (Y)
in the CCA1-like/R-R and I-box-like subgroups, respectively (Fig. 3). In addition, the first W residue was
substituted by Phe (F) in all members of the fourth clade of the TBP-like subgroup and in most members of
the CPC-like subgroup.



Interestingly, despite the divergence of the individual MYB domains, the consensus sequences SHAQK(Y/F)F
and LKDKW(R/K)(N/T) were highly conserved in the MYB domains of the CCA1-like/R-R and TBP-like
subgroups, respectively (Figs 2 and 3), thus providing unique criteria for identifying these types of MYB
proteins. The existence of highly conserved, subgroup-specific sites in the MYB domains also indicates a
common origin, despite variability among the different subgroups.
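
As an illustration of how the two consensus sequences could serve as identification criteria, the following minimal sketch scans a protein sequence for exact matches to SHAQK(Y/F)F and LKDKW(R/K)(N/T). The function name and the toy sequences are assumptions made for this example; real family members may deviate from the consensus at individual positions, so an exact-match scan is only a first-pass filter.

```python
import re

# Consensus motifs from the text, written as regular expressions.
SUBGROUP_MOTIFS = {
    "CCA1-like/R-R": re.compile(r"SHAQK[YF]F"),
    "TBP-like": re.compile(r"LKDKW[RK][NT]"),
}


def find_subgroup_motifs(protein_seq):
    """Return {subgroup: [0-based start positions]} for exact motif matches."""
    seq = protein_seq.upper()
    hits = {}
    for subgroup, pattern in SUBGROUP_MOTIFS.items():
        positions = [match.start() for match in pattern.finditer(seq)]
        if positions:
            hits[subgroup] = positions
    return hits


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(find_subgroup_motifs("MAAASHAQKFFLK"))
    print(find_subgroup_motifs("GGLKDKWRNLL"))
    print(find_subgroup_motifs("MSHLQKYRL"))  # GARP-like SHLQKY is not matched
```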

The third helix in the MYB domain plays a major role in recognizing cis-elements in target genes, whereas
the conserved W residues are important for forming the hydrophobic core and maintaining the
three-dimensional structure of the MYB repeat. This suggests that the molecular structures and biological
functions of each subgroup and/or clade were highly conserved during evolution.

3.4. Conservation of intron/exon structure within
MYB domains

To determine the intron patterns of MYB-related genes, we analysed the intron distribution in regions
encoding MYB domains. Most of the plant MYB-related genes (~87%) were disrupted by intron(s), with up to
two introns. In contrast, ~13% of MYB-related genes did not contain introns (Fig. 3, I-box-like subgroup).

Our results revealed that the intron patterns (Fig. 3a-k),
formed by relative position and phase, were highly
Figure 3. Sequence logos of the MYB domains of plant MYB-related proteins. The bit score indicates the information content for each position in
the sequence. Asterisks indicate conserved Trp (W) residues in the MYB domain. Dots indicate the conserved motif (DLx2Rx3Lx6Lx3R). Black
boxes indicate conserved motifs in the MYB domains. The intron patterns of land plant MYB-related genes are denoted a-k. White triangles
indicate the locations of introns, and the number within each triangle indicates the splicing phases of introns. The corresponding clades in
the NJ tree (Fig. 2) are listed on the right, for reference.



conserved in all subgroups and/or clades of plant MYB-related genes. The highly conserved intron patterns
within subgroups or clades provided an independent criterion for testing the reliability of our phylogenetic
analysis (Fig. 2). The intron patterns in algae were not conserved and were generally quite different from
those in land plants. However, an algal CCA1-like gene, VcMYBR05, showed the same intron pattern
(Fig. 3c) as that in the third clade of the CCA1-like/R-R subgroup, strongly supporting their common origin.

In addition, the intron phases were highly conserved in plant MYB-related genes within each subgroup.
For instance, pattern (a) was always in Phase 1, while pattern (d) was consistently in Phases 2 and 0 (Fig. 3),
resulting in a significant excess of non-symmetrical exons. This suggests that splicing phases were also
highly conserved during evolution. Overall, our results indicate a strong correlation between the
phylogeny and exon/intron structure of the MYB-related gene family.
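
For readers unfamiliar with intron phases, the following minimal sketch uses the standard definition: a phase 0 intron lies between two codons, a phase 1 intron follows the first base of a codon, and a phase 2 intron follows the second base. An exon is called symmetrical when its two flanking introns share the same phase. The function names and input convention (number of coding bases upstream of the intron) are assumptions made for this illustration.

```python
def intron_phase(cds_bases_before_intron):
    """Phase of an intron given the number of coding bases upstream of it.

    0: intron lies between two codons
    1: intron follows the first base of a codon
    2: intron follows the second base of a codon
    """
    if cds_bases_before_intron < 0:
        raise ValueError("base count must be non-negative")
    return cds_bases_before_intron % 3


def exon_is_symmetric(upstream_phase, downstream_phase):
    """An internal exon is symmetrical when both flanking introns share a phase."""
    return upstream_phase == downstream_phase


if __name__ == "__main__":
    print(intron_phase(30), intron_phase(31), intron_phase(32))
    # An exon flanked by a Phase 2 and a Phase 0 intron, as in pattern (d),
    # is non-symmetrical.
    print(exon_is_symmetric(2, 0))
```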



3.5. Molecular evolution of plant MYB-related genes 

To analyse the selective pressures acting during the expansion of plant MYB-related genes, we investigated
the influences of selective constraints on the MYB domains. By globally fitting an evolutionary model, we
first calculated the dN/dS ratios for each subgroup. The dN/dS values were substantially <1 in all
subgroups, providing a crude indication that strong purifying selection has been maintained across land
plants (Supplementary Table S2). At the individual codon level, most of the residues were under significant
negative selection (P < 0.05).

Because the CCA1-like/R-R and TBP-like subgroups subdivided into several clades (Fig. 2), the
preceding method merely estimated the dN/dS ratio across each subgroup, without considering
variations among clades in the large subgroups. Therefore, we estimated the dN/dS ratios for the
clades of the CCA1-like/R-R and TBP-like subgroups (Supplementary Table S2).
In general, the dN/dS values of individual clades were lower than those of the subgroups. However, the dN/dS
values of some individual clades were higher than those of the subgroups, and the number of residues under
significant negative selection was reduced (P < 0.05). Nevertheless, no clade showed a dN/dS value >1,
suggesting that different clades were subjected to different strengths of purifying selection. For example,
in TBP-like subgroup clades, the dN/dS values ranged from 0.06 to 0.32, while in CCA1-like/R-R subgroup
clades, the dN/dS values were <0.11. Thus, our dN/dS analysis suggests that selective constraints have
remained stable throughout the evolution of MYB-related genes in land plants.
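
The reading of dN/dS values used above follows the common convention that ratios well below 1 indicate purifying selection, ratios near 1 indicate neutral evolution, and ratios above 1 indicate positive selection. The sketch below only encodes that convention; it does not estimate dN or dS, which in this study was done with HyPhy. The function names and the `tolerance` band around 1 are arbitrary assumptions for illustration.

```python
def dn_ds_ratio(dn, ds):
    """Ratio of non-synonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def selection_regime(omega, tolerance=0.05):
    """Label a dN/dS value; `tolerance` is an illustrative band around 1."""
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # The range reported for TBP-like clades falls in the purifying regime.
    for omega in (0.06, 0.32):
        print(omega, selection_regime(omega))
```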

3.6. Distribution of MYB domains and non-MYB
motifs in plants

The MYB domains were found throughout the entire coding region of MYB-related proteins, even within
different clades of a subgroup (Fig. 4). For example, within the TBP-like subgroup, the MYB domain is at the
N-terminal region in the second clade and at the C-terminal region in the third clade. Similarly, the MYB
domains of the CCA1-like/R-R subgroup are located either at the N-terminal or at the middle region. Thus,
the location of MYB domains is less conserved in MYB-related proteins than in 2R-MYB proteins. 11 These
results illustrate the variability in the relative locations of the MYB domain and the high divergence of
MYB-related proteins.

Because sequences outside of the MYB domains are quite divergent, no conserved subgroup-specific
motifs were detected. However, we identified 34 clade-specific motifs in the CCA1-like/R-R, TRF-like,
and TBP-like subgroups (Supplementary Table S3). No motifs were detected in the CPC-like or I-box-like
subgroup, because they lack the C-terminal region (Fig. 4). Motifs 10, 11, 13, 14, and 16 in the
CCA1-like/R-R subgroup and motifs 19, 21, 29, 30, 31, and 33 in



Figure 4. Architecture of conserved protein motifs in plant MYB-related subgroups and/or clades. An
idealized representation of a typical member of each clade is shown, with the MYB domain and conserved
motifs drawn as numbered boxes. The diagrams are not drawn to scale. Rows shown: CCA1 (I), R-R (II),
CCA1 (III), CCA1 (IV), I-box, CPC, TRF, TBP (I), TBP (II), TBP (III), TBP (IV), and TBP (V).



the TBP-like subgroup were found only in angiosperms, suggesting that they are angiosperm-specific motifs
that originated after the evolution of angiosperms. Motifs 1, 9, and 34 were adjacent to MYB domains,
indicating that they co-evolved with the corresponding MYB domain (Fig. 4). In addition, motifs 1, 9, and 23
were present in most of the chlorophyte and/or red algal MYB-related proteins, suggesting that they are
ancient. Overall, the protein architectures of closely related members in a specific clade were remarkably
conserved, indicating a common origin and/or close relationship.

We also queried the PFAM database of protein domains using the candidate non-MYB motifs. With
the exception of motif 22, none of the 34 conserved motifs corresponded to known domains. Motif
22, shared by members of the second clade of the TBP-like subgroup, showed significant homology with
linker histones H1 and H5, which bind the nucleosome as a major component of chromatin and play a role in
chromatin dynamics. 27 In addition, in the same clade, we identified a region near the C-terminus that may
form a coiled-coil domain (motif 23). Such domains, found in many TFs, are predicted to stabilize protein
dimer formation. 28 The presence of motifs 22 and 23 supported our phylogenetic classification and suggested
a specific role for this type of MYB-related protein. Motifs 5-7 formed the first MYB repeat of R-R proteins,
demonstrating that the first repeat is less homologous to the typical MYB domain than the second repeat.

The conservation of these additional motifs demon- 
strates that the diversity of domain architecture has 
been maintained beyond the core components of 
the MYB domain, while the presence of clade- 
specific motifs indicates their recent common origin. 
Therefore, they may be essential for the function of 
MYB-related proteins. 



3.7. Expression analysis of MYB-related genes
at different developmental stages

To understand the temporal and spatial expression 
patterns of MYB-related genes, we compared their 
expression patterns during maize and soybean 
development. 

Microarray data of 60 different tissues/developmental conditions of maize 29 were used (Fig. 5). Few genes
were constitutively expressed in all organs and developmental stages. CCA1-like/R-R genes were expressed in
most organs examined, with the exception of seeds. However, six genes in a CCA1-like/R-R subgroup clade
(ZmMYBR02, ZmMYBR11, ZmMYBR34, ZmMYBR42, ZmMYBR65, and ZmMYBR67) showed higher expression
in seeds than in other organs, which indicated that they may play important roles in seed development
(Fig. 5). Similar to CCA1-like/R-R proteins, the I-box-like genes
Figure 5. Expression profiles of MYB-related genes in maize across different developmental stages and organs. The genes and their
corresponding intron patterns are on the right. The tissues used for expression analysis are indicated at the top of each column. The
colour bar represents log2 expression values.



were also expressed abundantly in many maize organs. This may indicate that these two subgroups
predominantly contribute to maize development. The expression of I-box-like genes and most of the
CCA1-like/R-R genes significantly decreased during seed development, further implying roles as negative
regulators in seed development. A CPC-like gene, ZmMYBR20, was highly expressed in leaf tissues, which
suggested that it may function in leaf development, or it may be restricted by 2R-MYB genes in other
developmental stages (see below).



The TRF-like subgroup included only two maize genes, which showed relatively high expression in
seeds. Although no TRF-like genes have yet been functionally characterized in plants, their preferential
expression in maize seed tissues implies possible roles in seed development. The TBP-like subgroup,
consisting of five clades, contained 14 maize genes. Although members of this subgroup displayed relatively
low expression in all examined organs, TBP-like genes were expressed more widely in maize (Fig. 5).
Furthermore, closely related genes generally showed highly similar


Origin and Diversification of MYB-Related Proteins in Plants [Vol. 20,

Figure 6. Expression profiles of maize MYB-related genes in response to drought stress or fungal infection. (A) The expression profiles of maize
MYB-related genes under drought stress. (B) The expression of maize MYB-related genes after fungal infection. Rows are labelled by gene and
intron pattern (for example, ZmMYBR07/i1); the panels carry the dataset labels GSE16567, GSE40052, GSE19501, GSE10023, and GSE31188.



expression patterns, indicating that they may share similar or overlapping functions.

We next analysed the expression profiles of soybean MYB-related genes. 30 The majority of the
127 soybean MYB-related genes were broadly expressed in the examined tissues. However, 22 soybean
genes were not expressed in this dataset, suggesting that they might be pseudogenes. In most cases,
the expression patterns of MYB-related genes in maize and soybean were very similar (Supplementary Fig.
S2). The expression patterns of soybean genes divided into two main groups. Most of the CCA1-like/R-R
genes showed prominent responses in the early stage of soybean development, while some TBP-like genes
were expressed at higher levels in leaves and seeds. There were also minor differences in the expression
patterns of I-box-like and CPC-like genes between maize and soybean. Some soybean I-box-like and CPC-like
genes showed higher expression in more tissues, suggesting that these genes may play wider roles in
soybean. The high similarity of MYB-related gene expression in maize and soybean indicates functional
conservation of this gene family in plants.



3.8. Expression analysis of MYB-related genes 
under biotic and abiotic stresses 

We further examined the roles of maize MYB-related genes in drought stress, based on the microarray data 19
(Fig. 6A). Twenty-six probe sets on the maize 18k GeneChip corresponded to 32 MYB-related genes
(five probes represented more than one gene). Because of the high sequence similarity among these genes,
the results require further experimental confirmation. Most of the maize MYB-related genes were expressed
at low levels, but were preferentially expressed under specific stress conditions. The expression patterns
were very similar in the tolerant (Han21) and sensitive (Ye478) lines. Among the genes,
four CCA1-like/R-R genes (ZmMYBR19, ZmMYBR28, ZmMYBR49, and ZmMYBR56), one TRF-like gene
(ZmMYBR41), and six TBP-like genes (ZmMYBR07, ZmMYBR26, ZmMYBR37, ZmMYBR45, ZmMYBR47, and
ZmMYBR55) increased in expression in response to drought stress. In contrast, the expression of five
CCA1-like/R-R genes (ZmMYBR03, ZmMYBR18, ZmMYBR27, ZmMYBR44, and ZmMYBR63) and one I-box-like gene
(ZmMYBR37) was significantly down-regulated by drought stress and recovered after re-watering. Thus,
MYB-related genes likely contribute to the drought response.

To explore the roles of maize MYB-related genes in the response to pathogens, we investigated their
expression after treatment with Sphacelotheca reiliana, Fusarium moniliforme, Ustilago maydis, or
Colletotrichum graminicola. 19 As shown in Fig. 6B, the majority of maize genes analysed were differentially
expressed over time after inoculation with these four pathogens. In general, the genes showed similar
expression patterns in response to each of the pathogens. For example, ZmMYBR05, ZmMYBR45, and ZmMYBR56
were up-regulated after infection with the four pathogens. However, some MYB-related genes also showed
different expression patterns in different lines in response to the same pathogen. Furthermore, the
expression of maize MYB-related genes varied more with time after U. maydis infection. Taken together, our
results showed that MYB-related genes might participate in the maize pathogen response.

We used the Illumina transcriptome sequencing data 31 to assess the expression of soybean MYB-related
genes under pathogen stress (Supplementary Fig. S3A). Most of the soybean MYB-related genes were
induced after infection with Bradyrhizobium japonicum. Moreover, the expression patterns differed
significantly between root hair and stripped root samples: many genes exhibited much higher expression in
root hairs than in stripped roots. In most cases, the soybean genes were differentially up-regulated upon
B. japonicum infection. However, the expression of some genes was reduced. To extend the expression
analysis of soybean MYB-related genes, we used the Affymetrix array data housed within the PLEXdb. 20
Thirty-two probes corresponded to individual soybean genes, of which 22 matched 2 genes and 1 matched
3 genes. Most of the soybean genes were strongly induced in hypocotyls infected with Phytophthora sojae
(Supplementary Fig. S3B), which suggests that these genes also contribute to the pathogen response. In
addition, a few soybean genes, such as GmMYBR028 and GmMYBR126, were up-regulated after infestation
with aphids, indicating a possible role in the response to aphids.

3.9. Evolution and divergence of MYB-related proteins 

Our phylogenetic analysis allowed us to assess the origin and evolutionary relationships among different
subgroups. In the CCA1-like/R-R subgroup, the inclusion of all five chlorophyte algae and the red alga
implies that this subgroup predates the divergence of red algae from the ancestor of land plants 1.5 billion
yrs ago. 32 This subgroup contained four major clades (Fig. 2). The first two clades were characterized by the
location of the MYB domain and the presence of motif 1 adjacent to the C-terminus of the MYB domain;
these two clades were further distinguishable by a number of clade-specific motifs (Fig. 4). Similar results
were also observed for the other two clades. Interestingly, all four clades included chlorophyte
algal proteins, suggesting that they differentiated from a common ancestor before the origin of land
plants. Clades III and IV appeared to be relatively older, since they clustered with several red algal proteins
(Fig. 2).

In contrast, the I-box-like subgroup seems to have
evolved recently in angiosperms. No obvious orthologues
were detected in algae, moss, or Selaginella.
Although the MYB domains of different subgroups
were generally quite divergent, those of I-box-like proteins
were highly homologous (~40% identity) to the
first MYB repeats of R-R proteins. One significant difference
was an amino acid deletion in I-box-like proteins
(Supplementary Fig. S4). Consistent with a previous
study, 25 when both MYB repeats of R-R proteins were
used in the phylogenetic analysis, the second repeats
clustered within the CCA1-like/R-R subgroup, while
the first repeats clustered within the I-box-like subgroup
with high bootstrap values (Supplementary Fig.
S5). These results imply that I-box-like proteins
evolved from R-R proteins through gene disruption
in angiosperms ~415 million yrs ago. 33
During this process, I-box-like proteins likely evolved
from the first repeats in R-R proteins, while the second
repeats formed the first clade of the CCA1-like/R-R subgroup.
This view is supported by the conserved intron
patterns of I-box-like genes and the first MYB repeats
of R-R genes (both of which are intronless).
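
The identity figure quoted above depends on how aligned positions are counted. The following is a minimal sketch of one common convention, in which columns containing a gap in either sequence are skipped; the text does not state which convention was used here, so this is an assumption. The function name and both toy sequences are invented for illustration and are not taken from the dataset.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where both sequences have a residue.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the full alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Invented toy alignment; the gap stands in for a single-residue deletion.
TOY_SINGLE_REPEAT = "WTEEEH-RLFL"
TOY_FIRST_REPEAT = "WSKEEHDKLFV"

print(percent_identity(TOY_SINGLE_REPEAT, TOY_FIRST_REPEAT))
```

In this toy case, 6 of the 10 ungapped columns are identical, so the gapped column does not lower the score.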

The TBP-like subgroup is composed of MYB-related 
proteins from chlorophyte algae and land plants, sug- 
gesting that it is >1 billion yrs old. 34 Among its five 
major clades, the second, third, and fourth clades are 
likely the oldest, because they contain algae proteins. 
While the second and fourth clades of this subgroup 
share the same intron pattern (i), the positions of 
their MYB domains differ (Fig. 4). Since both of these 
clades include algae proteins, domain shifting could 
have occurred earlier in plants. Similar results were 
also found in the CCA1-like/R-R subgroup (Fig. 4).

The CPC-like subgroup includes two major clades.
Clade II is composed of V. carteri and C. reinhardtii proteins
and is sister to Clade I of angiosperm proteins
(Fig. 2). Recently, we found that some soybean 2R-MYB
genes are alternatively spliced, resulting in a
change from 2R-MYB to R3-MYB. 11 Alignment analysis
showed that the MYB domains of angiosperm CPC-like 
proteins had significant homology to the R3 repeats 
of 2R-MYB proteins. Both sequences contain a con- 
served motif, [DE]Lx2[RK]x3Lx6Lx3R (Fig. 3), which 
specifies the interaction with bHLH proteins. 35,36 
Phylogenetic tree analysis showed that such R3-MYB 
proteins clustered within the CPC-like subgroup (data
not shown). However, we did not find this motif in algae
CPC-like proteins or 2R-MYB proteins. Further phylo- 
genetic analysis revealed that the algae CPC-like pro- 
teins and R3 repeats of 2R-MYB proteins also 
clustered within a clade (data not shown). This result 
suggests that the algae CPC-like proteins originated 
from 2R-MYB proteins after the divergence from a 
common ancestor of land plants. The absence of the 
[DE]Lx2[RK]x3Lx6Lx3R motif in algae CPC-like and 
2R-MYB proteins, as well as in lower land plants, 
implies that the interaction between MYB and bHLH 
proteins may be angiosperm specific. The TRF-like sub- 
group included proteins from angiosperms and moss, 
but not from lycophytes, suggesting a loss of these 
genes from lycophytes. Though the subgroup is very 
small, it is >443 million yrs old. 37
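
The bHLH-interaction motif [DE]Lx2[RK]x3Lx6Lx3R discussed above can be written directly as a regular expression, with each xN read as N arbitrary residues. The sketch below shows how a protein sequence could be screened for it. The function name and the two toy sequences are hypothetical and are not real CPC-like or 2R-MYB proteins; the text does not say which tool was used for the motif search.

```python
import re

# [DE]Lx2[RK]x3Lx6Lx3R, with "xN" translated to ".{N}" (any N residues).
BHLH_MOTIF = re.compile(r"[DE]L.{2}[RK].{3}L.{6}L.{3}R")


def find_bhlh_motif(protein: str):
    """Return (start, matched_text) for each non-overlapping motif occurrence."""
    return [(m.start(), m.group()) for m in BHLH_MOTIF.finditer(protein.upper())]


# Invented sequences: the second differs only at the [RK] position (K -> A).
TOY_WITH_MOTIF = "MAAA" + "DLAAKAAALAAAAAALAAAR" + "GG"
TOY_WITHOUT_MOTIF = "MAAA" + "DLAAAAAALAAAAAALAAAR" + "GG"

print(find_bhlh_motif(TOY_WITH_MOTIF))
print(find_bhlh_motif(TOY_WITHOUT_MOTIF))
```

A single substitution at a fixed position is enough to abolish the match, which is why absence of the pattern in a whole lineage is informative.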

Our results also showed a gradual increase in the 
number of MYB-related genes from moss to flowering 
plants (Fig. 1). This finding suggests the evolutionary diversification
of MYB-related proteins through extensive
expansion during plant evolution. This expansion 
appears to have occurred in three important stages. 
The first stage likely predated the origin of red algae
and led to the establishment of CCA1-like/R-R and
TBP-like proteins, containing the motifs SHAQK(Y/F)F
and LKDKW(R/K)(N/T), respectively. The second
stage may have occurred at the early origin of land 
plants to establish the diversity of MYB domains, 
intron patterns, and non-MYB motifs. The third stage 
may have occurred after the split between gymnos- 
perms and angiosperms, as reflected in the greater 
size in angiosperms within each subgroup (Fig. 2). 
Despite several rounds of gene duplication and loss in
different plant lineages, these subgroups have
remained highly conserved throughout plant evolution.
Chromosomal distribution analysis revealed that
MYB-related genes were distributed throughout all corresponding
chromosomes in each species (data not
shown). Compared with 2R-MYB proteins, 10,11 we
detected fewer tandem duplication events in the
MYB-related gene family, which suggests that its
expansion occurred mainly through genome-wide duplication.
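
The two signature motifs named for the first expansion stage, SHAQK(Y/F)F and LKDKW(R/K)(N/T), can be used as a quick screen for candidate subgroup membership. The sketch below is only an illustration of that idea: subgroup assignment in this study rests on phylogenetic analysis, not on motif presence alone, and the function name and toy sequences are invented.

```python
import re
from typing import List

# Alternatives such as (Y/F) become character classes such as [YF].
SIGNATURES = {
    "CCA1-like/R-R": re.compile(r"SHAQK[YF]F"),
    "TBP-like": re.compile(r"LKDKW[RK][NT]"),
}


def signature_hits(protein: str) -> List[str]:
    """Return the names of the subgroup signatures found in the sequence."""
    seq = protein.upper()
    return [name for name, pattern in SIGNATURES.items() if pattern.search(seq)]


# Invented toy sequences, not real proteins.
print(signature_hits("AAASHAQKYFAAA"))
print(signature_hits("GGLKDKWRNLL"))
print(signature_hits("AAAA"))
```

A sequence carrying neither signature returns an empty list and would need to be placed by other evidence.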

Furthermore, CCA1-like and TBP-like genes are also
present in other eukaryotes, including fungi and
metazoans (data not shown). Taken together, our
results indicate that CCA1-like and TBP-like genes are
much older than previously thought.



3.10. Functional diversity of MYB-related genes

Putative orthologues in each subgroup and/or clade 
indicate conserved physiological functions. Therefore, 
we performed a comparative phylogenetic analysis of 
Arabidopsis and other well-known plant MYB-related 
proteins (Supplementary Fig. S6). Supplementary
Table S4 summarizes the functions of plant MYB-related
genes.

CCA1-like/R-R genes are best known for their involvement
in circadian rhythm regulation, and are
highly conserved in plants (Supplementary Table S4),
which implies that the functional divergence of clock
machinery occurred before the divergence of land
plants. Consistently, in the analysis of PLEXdb microarray
data, 19 we identified AtCCA1 gene homologues
in maize and soybean that are involved in circadian
rhythmicity (Supplementary Fig. S7A). In our study,
CCA1-like genes involved in circadian rhythmicity
divided into two clades with different intron patterns
(c and d). This may explain why they have similar functions
but act through different mechanisms. Moreover, we
observed relatively high expression of MYB-related
genes in the flower tissues of maize and soybean, consistent
with a role in regulating floral development. 38
Taken together, these results indicate the functional diversity
and conservation of CCA1-like/R-R proteins and
their crucial roles in plant development.

The cooperative interaction between MYB and bHLH 
TFs is a classical example of combinatorial regulation. 
Two types of MYB TFs, 2R-MYB (WER) and CPC-like (or 
R3-MYB), have similar functions in cell-fate determination. 39,40
Both contain the conserved motif
DLx2Rx3Lx6Lx3R in their MYB domains, which is
involved in MYB-bHLH interactions 35 (Fig. 3). CPC-
like proteins can interact with bHLH proteins, thereby 
competing with the 2R-MYB protein in the regulation 
of plant development. 41 In the present study, one 
soybean CPC-like gene, GmMYBR79, was expressed in 
the soybean root hair (Supplementary Fig. S3A), implying
a similar role in soybean root hair development.
Recently, CPC-like genes were shown to down-regulate 
anthocyanin synthesis by a similar mechanism. 42 In our 
expression analysis, one maize (ZmMYBR20) and two 
soybean (GmMYBR78 and GmMYBR80) CPC-like 
genes showed high expression in flower tissues (Fig. 5 
and Supplementary Fig. S2), suggesting that they may 
regulate anthocyanin synthesis via similar mechanisms. 
These results demonstrate the close relationship 
between CPC-like proteins and 2R-MYB proteins and 
the functional conservation of this motif during
evolution.

Members of the I-box-like subgroup are also key developmental
regulators in various plant tissues
(Supplementary Table S4). Consistently, our results
showed that I-box-like genes have broad expression
profiles in maize and soybean (Fig. 5 and
Supplementary Fig. S2). To date, the best-known
role of I-box-like genes is the regulation of floral asymmetry. 43,44
Interestingly, the I-box-like genes not only
showed high homology and similar expression patterns
with R-R-like genes, but also appeared to have similar
functions in flower development. This further
demonstrates that I-box-like genes evolved from R-R-like
genes and maintained functions similar to those
of R-R-like proteins. A relatively small number of TBP-like
genes have been functionally characterized
(Supplementary Table S4). Most of the known TBP-like
genes encode telomere-binding proteins. 8 The
strong divergence of this subgroup implies that its
members might have additional diverse functions.

Supplementary data: Supplementary data are 
available at www.dnaresearch.oxfordjournals.org.